Aggregating and Predicting Sequence Labels from Crowd Annotations

Authors

  • An Thanh Nguyen
  • Byron C. Wallace
  • Junyi Jessy Li
  • Ani Nenkova
  • Matthew Lease
Abstract

Despite sequences being core to NLP, scant work has considered how to handle noisy sequence labels from multiple annotators for the same text. Given such annotations, we consider two complementary tasks: (1) aggregating sequential crowd labels to infer a best single set of consensus annotations; and (2) using crowd annotations as training data for a model that can predict sequences in unannotated text. For aggregation, we propose a novel Hidden Markov Model variant. To predict sequences in unannotated text, we propose a neural approach using Long Short-Term Memory (LSTM). We evaluate a suite of methods across two different applications and text genres: Named-Entity Recognition in news articles and Information Extraction from biomedical abstracts. Results show improvement over strong baselines. Our source code and data are available online.
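To make the aggregation task concrete, here is a minimal sketch of token-wise majority voting over several annotators' label sequences. This is an illustrative baseline only, not the Hidden Markov Model variant the abstract proposes; the function name `majority_vote`, the BIO tag set, and the toy annotations are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(annotations: Sequence[Sequence[str]]) -> List[str]:
    """Token-wise majority vote over equal-length label sequences.

    Hypothetical helper for illustration; each inner sequence holds one
    annotator's labels for the same tokens.
    """
    if not annotations:
        return []
    length = len(annotations[0])
    if any(len(seq) != length for seq in annotations):
        raise ValueError("all annotators must label the same tokens")

    consensus = []
    for position in range(length):
        votes = Counter(seq[position] for seq in annotations)
        # Ties go to the label seen first, i.e. the earliest annotator's.
        consensus.append(votes.most_common(1)[0][0])
    return consensus


# Toy data (assumed): three annotators label four tokens with BIO tags.
crowd = [
    ["B-PER", "I-PER", "O", "O"],
    ["B-PER", "O", "O", "B-LOC"],
    ["B-PER", "I-PER", "O", "B-LOC"],
]
consensus = majority_vote(crowd)
print(consensus)
```

Because each position is decided independently, this kind of voting ignores dependencies between neighboring labels and can produce inconsistent tag sequences, which is one reason to prefer a sequence-aware aggregation model.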

Similar resources

Aggregating Crowd Wisdoms with Label-aware Autoencoders

Aggregating crowd wisdoms takes multiple labels from various sources and infers true labels for objects. Recent research has made progress by learning source credibility from data, roughly forming three kinds of modeling frameworks: weighted majority voting, trust propagation, and generative models. In this paper, we propose a novel framework named Label-Aware Autoencoders (LAA) to aggregate ...

Modeling annotator behaviors for crowd labeling

Machine learning applications can benefit greatly from vast amounts of data, provided that reliable labels are available. Mobilizing crowds to annotate the unlabeled data is a common solution. Although the labels provided by the crowd are subjective and noisy, the wisdom of crowds can be captured by a variety of techniques. Finding the mean or the median of a sample’s annotations is widely use...

Adversarial Learning for Chinese NER from Crowd Annotations

To quickly obtain new labeled data, we can choose crowdsourcing as an alternative way at lower cost in a short time. But as an exchange, crowd annotations from non-experts may be of lower quality than those from experts. In this paper, we propose an approach to performing crowd annotation learning for Chinese Named Entity Recognition (NER) to make full use of the noisy sequence labels from mult...

Effectively Crowdsourcing Radiology Report Annotations

Crowdsourcing platforms are a popular choice for researchers to gather text annotations quickly at scale. We investigate whether crowdsourced annotations are useful when the labeling task requires medical domain knowledge. Comparing a sentence classification model trained with expert-annotated sentences to the same model trained on crowd-labeled sentences, we find the crowdsourced training data...

Crowd behavior representation: an attribute-based approach

In crowd behavior studies, a model of crowd behavior needs to be trained using the information extracted from video sequences. Most of the previous methods are based on low-level visual features because there are only crowd behavior labels available as ground-truth information in crowd datasets. However, there is a huge semantic gap between low-level motion/appearance features and high-level co...

Journal title:
  • Proceedings of the conference. Association for Computational Linguistics. Meeting

Volume 2017

Publication date 2017